Parsing Noun Phrase Structure with CCG
Abstract
Statistical parsing of noun phrase (NP) structure has been hampered by a lack of gold-standard data. This is a significant problem for CCGbank, where binary-branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank. We correct these errors in CCGbank using a gold-standard corpus of NP structure, resulting in a much more accurate corpus. We also implement novel NER features that generalise the lexical information needed to parse NPs and provide important semantic information. Finally, evaluating against DepBank demonstrates the effectiveness of our modified corpus and novel features, with an increase in parser performance of 1.51%.
Similar Resources
Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency ...
Statistical parsing of noun phrase structure
Noun phrases (NPs) are a crucial part of natural language, exhibiting in many cases an extremely complex structure. However, NP structure is largely ignored by the statistical parsing field, as the most widely-used corpus is not annotated with it. This lack of gold-standard data has restricted all previous efforts to parse NPs, making it impossible to perform the supervised experiments that hav...
A CCG-based Quality Estimation Metric for Statistical Machine Translation
We describe a metric for estimating the quality of Statistical Machine Translation (SMT) output based on syntactic features extracted using Combinatory Categorial Grammar (CCG). CCG has been demonstrated to be better suited to deal with SMT texts than context free phrase structure grammar formalisms. We use CCG features to estimate the grammaticality of the translations by dividing them into ma...
Survey: Parsing and Parallelization
Parsing is a process of building structure onto a sentence or other string of characters so that the meaning of the sentence or string can be derived. Consider a simple English sentence: “the boy hit the ball.” Thinking back to elementary school English class, the sentence can be broken into parts of speech as per Figure 1: an article followed by a noun, then a verb, then an article, then a nou...
Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser
This paper investigates perceptron training for a wide-coverage CCG parser and compares the perceptron with a log-linear model. The CCG parser uses a phrase-structure parsing model and dynamic programming in the form of the Viterbi algorithm to find the highest scoring derivation. The difficulty in using the perceptron for a phrase-structure parsing model is the need for an efficient decoder. W...